---
title: "Download State Health Agency and CDC/FDA Tweets from 2012 through 2022"
author:
  - Samuel R. Mendez, Harvard T.H. Chan School of Public Health
date: "July 13, 2023"
output:
  html_document:
    theme: readable
    toc: true
    number_sections: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Read Me
Read important info before you run the code in this file or try to interpret the outputs.

### Purpose
This RMD file is meant to enhance the reproducibility and transparency of the work I undertook as part of my doctoral dissertation in Population Health Sciences, in the Social and Behavioral Sciences department at the Harvard T.H. Chan School of Public Health. The overarching goal of my dissertation work was to broaden the scope of health literacy research methods by integrating natural language processing techniques and media studies theoretical frameworks. This file describes a reproducible process of downloading Tweets posted from 2012 through 2022 by state public health agency accounts and selected CDC and FDA accounts, and of merging them into a single data frame.

### Assumptions
Note that in this RMD file, I make a few key assumptions. I assume you have the required packages installed in your working environment, or that your setup is compatible. I assume you have access to Twitter's Academic Research API (v2), or a similar product, to reproduce my queries. (Note that at the time of updating this document on July 13, 2023, the Academic Research API appears to be defunct. However, there was no official announcement, and I did not receive any notification announcing a specific end to academic access.)

### License
In the spirit of open science, this file is available under a CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. In short, this means you can copy, modify, and distribute the work for any purpose without asking permission. Though not required, for use of this public domain material, I kindly ask that you cite the [work as published on Harvard Dataverse](https://doi.org/10.7910/DVN/VX4HK8).

For full license information, see the [Creative Commons CC0 1.0 License Page](https://creativecommons.org/publicdomain/zero/1.0/).

### Dependencies
This RMD was created using R (Version 4.2.3) and RStudio 2022.12.0+353 "Elsbeth Geranium" Release for Windows.
```{r dependencies, message=FALSE, warning=FALSE, eval=FALSE}
#dplyr (version 1.1.1)
require("dplyr")
#academictwitteR (version 0.3.1)
require("academictwitteR")
```
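Before running anything, it may help to confirm that the installed package versions roughly match the ones listed above. The following is a minimal base-R sketch: the `expected` vector simply restates the versions in the chunk comments, and a mismatch does not necessarily mean the code will fail.

```r
# Versions listed in the dependency comments above
expected <- c(dplyr = "1.1.1", academictwitteR = "0.3.1")

# Installed version of each package, or NA if it is not installed
installed <- vapply(names(expected), function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}, character(1))

# Side-by-side comparison; 'matches' is FALSE for missing or different versions
version_check <- data.frame(
  package = names(expected),
  expected = unname(expected),
  installed = unname(installed),
  matches = !is.na(installed) & installed == expected
)
print(version_check)
```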

## Read About the Twitter Data
Read about the process I used to download the Twitter data for my dissertation in this section.

### Intentions with Data Download
This data set forms the basis of two of my dissertation papers, focused on integrating natural language processing and health literacy research methods. As I wanted flexibility to expand the scope of my intended dissertation work, I included Tweets for all of 2012 through 2022 in this download. I included Twitter accounts for all state public health agencies in this data set, as well as official governmental accounts created specifically for state-level COVID-19 updates. I also included accounts I deemed relevant to COVID-19 communication from the CDC and the FDA. This forms a much larger data set than I use in my dissertation, and the steps to filter and clean this data set for specific tasks in my dissertation are described elsewhere.

### Caveats
As these data were downloaded retroactively, it is possible that state agencies deleted some of their Tweets after their original publication, so some Tweets may not have been available at the time of download on April 9, 2023. However, this seems to be a minimal concern given the volume and consistency of Tweets available across states.

It is also worth noting that Utah, Nevada, and Connecticut all created separate Twitter accounts to communicate about their COVID-19 response. The queries below include the state public health agency accounts of Nevada and Connecticut, as well as their COVID-19 response accounts. The same principle applies to Utah, but is complicated by the merger of its Department of Health and its Department of Human Services into a single Department of Health and Human Services on July 1, 2022. The Twitter account formerly belonging to the Utah Department of Health was renamed to reflect the new DHHS, with prior Tweets preserved. As such, those prior Tweets appear under the UtahDHHS account even though the department did not exist during 2020 or 2021. I decided to include both accounts because early COVID-19 communication took place before these pandemic-specific accounts were created, and both kinds of account would serve as an authoritative voice in state public health.

In the inverse of Utah's situation, the Alaska Department of Health and Social Services (DHSS) split into two departments through a gradual process in 2022. The split was announced in an executive order on March 19, 2022, and the two departments were legally operating as separate entities on July 1, 2022. The DHSS Twitter account was renamed to reflect its new management under the Department of Health. As such, the previous DHSS Tweets were still available under the Alaska DOH account, even though that department did not exist during 2020 or 2021.

The NCPublicHealth account from the North Carolina Division of Public Health was officially announced as no longer in use on May 6, 2022, via a pinned Tweet. The pinned Tweet directs viewers to the NCDHHS account. The queries below include both accounts, as the Tweets from both accounts were available at the time of download, and both accounts were in use during 2020 and 2021.

Maine is also represented in this data set by two Twitter accounts: a state CDC account and a state DHHS account. This was the one case where I determined that more than one department in a state communicated about COVID-19 with authority in public health.

Although the Wyoming Department of Health Twitter account still exists, it was used only infrequently, in 2018 and 2019.

Finally, on July 13, 2023, I intended to add a query for the Montana account @health406, which is associated with the state's "Health in the 406" newsletter and communication center. As of July 13, 2023, the account description reads, "Health in the 406 is a regular communication by DPHHS on a variety of wellness topics that impact the daily lives of Montanans." The account was not originally included in this dataset, which was skewed towards COVID-19 communication in the years of the pandemic. I intended to add it because it is a regular source of health information from the state government. However, by that point, the academic research tier of the Twitter API was no longer available.

This account was created in October 2019, and as of July 13, 2023, Twitter lists it as having 2,840 Tweets. A manual search on Twitter on that same date revealed its earliest Tweet to be from October 10, 2019. This amounts to an average of roughly 2.1 Tweets per day since the account's creation. Assuming a steady rate of Tweets over time, this amounts to roughly 2,400 Tweets from this account during the study period (2012 through 2022). I used the Twitter website's advanced search function to retrieve Tweets from this account between March 1, 2020, and April 30, 2020. This search revealed 70 Tweets, 26 of which I determined to be pandemic-related. Note that the overall size of this dataset is n=690281, with n=599440 Tweets from state health agencies. The Montana DPHHS account included in this dataset (DPHHSMT) has n=1051 Tweets during the study period.
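The per-day rate and the study-period estimate for @health406 come from simple date arithmetic. The following is a minimal base-R sketch that reproduces that arithmetic from the dates and counts stated above, under the same steady-rate assumption; it is a back-of-the-envelope check, not a count of actual Tweets.

```r
# Figures stated in the text for the @health406 account
first_tweet  <- as.Date("2019-10-10")  # earliest Tweet found by manual search
count_date   <- as.Date("2023-07-13")  # date the Tweet count was read
study_end    <- as.Date("2022-12-31")  # end of the study period
total_tweets <- 2840                   # Tweet count listed on the profile

# Average Tweets per day over the account's life
days_active  <- as.numeric(count_date - first_tweet)
rate_per_day <- total_tweets / days_active

# Steady-rate estimate of Tweets posted before the study period ended
days_in_study    <- as.numeric(study_end - first_tweet)
est_study_tweets <- rate_per_day * days_in_study

round(c(days_active = days_active, rate_per_day = rate_per_day,
        days_in_study = days_in_study, est_study_tweets = est_study_tweets), 2)
```

With these inputs, the account averages about 2.07 Tweets per day, and the steady-rate estimate for October 10, 2019 through December 31, 2022 is about 2,440 Tweets.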

Summary information about every account included in this data set is in the file "twitterHandles.xlsx" on this dataset's Harvard Dataverse entry.

## Create Dataframe
Use the code in this section to recreate the merged data frame containing all of the Tweet data I downloaded based on the above explanations.

### Download Data
Here, I downloaded the data through a series of API calls, divided into smaller chunks to make recovering from internet disruptions a little easier. For all of the calls, I picked n=200000 as an arbitrarily large number that I expected to exceed the number of Tweets from each set of users during each time period. All queries returned fewer than 200000 Tweets, which confirms that none of them reached this cap.
```{r APICall00, eval=FALSE, message=FALSE, tidy=TRUE, results="hide"}
#Define user list
users00<-c("ADHPIO",
          "Alaska_DOH",
          "ALPublicHealth",
          "AZDHS",
          "CAPublicHealth",
          "CDC_eHealth",
          "CDCemergency",
          "CDCFlu",
          "CDCgov",
          "CDCHealthEquity")
#Make API calls with user list
query00 <- get_all_tweets(users = users00,
                      	start_tweets = "2011-12-31T00:00:00Z",
                      	end_tweets = "2014-12-31T18:00:00Z",
                      	n = 200000, data_path = "query00")
query01 <- get_all_tweets(users = users00,
                      	start_tweets = "2014-12-31T00:00:00Z",
                      	end_tweets = "2017-12-31T18:00:00Z",
                      	n = 200000, data_path = "query01")
query02 <- get_all_tweets(users = users00,
                      	start_tweets = "2017-12-31T00:00:00Z",
                      	end_tweets = "2022-12-31T18:00:00Z",
                      	n = 200000, data_path = "query02")
```
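Whether any query reached the n = 200000 cap can be checked directly once the queries have run. The following is a minimal sketch, assuming the query data frames are collected into a named list; the `queries` list and its toy contents are hypothetical stand-ins. A query whose row count reaches the cap may have been cut off and would need to be split into smaller date windows.

```r
# Cap passed as `n` in every get_all_tweets() call
cap <- 200000

# Hypothetical stand-ins; in practice: list(query00 = query00, query01 = query01, ...)
queries <- list(
  query00 = data.frame(id = as.character(1:3)),
  query01 = data.frame(id = as.character(4:5))
)

# Row count of each query, and whether it stayed strictly below the cap
row_counts <- vapply(queries, nrow, integer(1))
below_cap <- row_counts < cap
print(row_counts)

# A query that reached the cap may have been cut off and should be re-run
# over smaller date windows
if (!all(below_cap)) {
  warning("Possible truncation in: ",
          paste(names(queries)[!below_cap], collapse = ", "))
}
```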

Here, I backed up the individual query results for myself in RDS format using saveRDS(), even though the file names use an .Rdata extension.
```{r exportTweetQueries00, eval=FALSE, tidy=TRUE}
#At time of authoring, n=28711
saveRDS(query00, file=("query00.Rdata"))
#At time of authoring, n=30982
saveRDS(query01, file=("query01.Rdata"))
#At time of authoring, n=61391
saveRDS(query02, file=("query02.Rdata"))
```
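Files written with saveRDS() hold a single serialized object and are restored with readRDS(), not load() (which reads files written by save()), regardless of their extension. The following is a self-contained sketch of the round trip, using a temporary file and a toy data frame in place of a real query result.

```r
# Toy stand-in for a query result
toy_query <- data.frame(id = c("1001", "1002"), text = c("a", "b"))

# Save and restore it the same way as the backups above
backup_path <- file.path(tempdir(), "toy_query.Rdata")
saveRDS(toy_query, file = backup_path)
restored <- readRDS(backup_path)

# TRUE: the restored object is identical to the original
identical(restored, toy_query)

# Restoring all 21 real backups from the working directory might look like:
# queries <- lapply(sprintf("query%02d.Rdata", 0:20), readRDS)
# names(queries) <- sprintf("query%02d", 0:20)
```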

I then repeated this process for all of the other accounts. A function would have reduced the repetition, but I found it useful to have each step written out explicitly.
```{r APICall01, eval=FALSE, message=FALSE, tidy=TRUE, results="hide"}
#Define user list
users01<-c("CDPHE",
          "Covid19Ct",
          "CTDPH",
          "Delaware_DHSS",
          "DhhsNevada",
          "DHSWI",
          "DPHHSMT",
          "FDA_Global",
          "FDAHealthEquity",
          "FDAMedia")
#Make API calls with user list
query03 <- get_all_tweets(users = users01,
                      	start_tweets = "2011-12-31T00:00:00Z",
                      	end_tweets = "2014-12-31T18:00:00Z",
                      	n = 200000, data_path = "query03")
query04 <- get_all_tweets(users = users01,
                      	start_tweets = "2014-12-31T00:00:00Z",
                      	end_tweets = "2017-12-31T18:00:00Z",
                      	n = 200000, data_path = "query04")
query05 <- get_all_tweets(users = users01,
                      	start_tweets = "2017-12-31T00:00:00Z",
                      	end_tweets = "2022-12-31T18:00:00Z",
                      	n = 200000, data_path = "query05")
```

```{r exportTweetQueries01, eval=FALSE, tidy=TRUE}
#At time of authoring, n=20246
saveRDS(query03, file=("query03.Rdata"))
#At time of authoring, n=18975
saveRDS(query04, file=("query04.Rdata"))
#At time of authoring, n=37005
saveRDS(query05, file=("query05.Rdata"))
```

```{r APICall02, eval=FALSE, message=FALSE, tidy=TRUE, results="hide"}
#Define user list
users02<-c("GaDPH",
          "HawaiiDOH",
          "Health_wyoming",
          "HealthNYGov",
          "healthvermont",
          "HealthyFla",
          "HealthyLivingMo",
          "HealthyOklahoma",
          "Hhsndgov",
          "IowaHHS",
          "IDHW")
#Make API calls with user list
query06 <- get_all_tweets(users = users02,
                      	start_tweets = "2011-12-31T00:00:00Z",
                      	end_tweets = "2014-12-31T18:00:00Z",
                      	n = 200000, data_path = "query06")
query07 <- get_all_tweets(users = users02,
                      	start_tweets = "2014-12-31T00:00:00Z",
                      	end_tweets = "2017-12-31T18:00:00Z",
                      	n = 200000, data_path = "query07")
query08 <- get_all_tweets(users = users02,
                      	start_tweets = "2017-12-31T00:00:00Z",
                      	end_tweets = "2022-12-31T18:00:00Z",
                      	n = 200000, data_path = "query08")
```

```{r exportTweetQueries02, eval=FALSE, tidy=TRUE}
#At time of authoring, n=16970
saveRDS(query06, file=("query06.Rdata"))
#At time of authoring, n=32270
saveRDS(query07, file=("query07.Rdata"))
#At time of authoring, n=59061
saveRDS(query08, file=("query08.Rdata"))
```

```{r APICall03, eval=FALSE, message=FALSE, tidy=TRUE, results="hide"}
#Define user list
users03<-c("IDPH",
          "KDHE",
          "KYHealthAlerts",
          "LADeptHealth",
          "mainedhhs",
          "MassDPH",
          "MDHealthDept",
          "MEPublicHealth",
          "MichiganHHS",
          "mnhealth")
#Make API calls with user list
query09 <- get_all_tweets(users = users03,
                      	start_tweets = "2011-12-31T00:00:00Z",
                      	end_tweets = "2014-12-31T18:00:00Z",
                      	n = 200000, data_path = "query09")
query10 <- get_all_tweets(users = users03,
                      	start_tweets = "2014-12-31T00:00:00Z",
                      	end_tweets = "2017-12-31T18:00:00Z",
                      	n = 200000, data_path = "query10")
query11 <- get_all_tweets(users = users03,
                      	start_tweets = "2017-12-31T00:00:00Z",
                      	end_tweets = "2022-12-31T18:00:00Z",
                      	n = 200000, data_path = "query11")
```

```{r exportTweetQueries03, eval=FALSE, tidy=TRUE}
#At time of authoring, n=18467
saveRDS(query09, file=("query09.Rdata"))
#At time of authoring, n=22274
saveRDS(query10, file=("query10.Rdata"))
#At time of authoring, n=64212
saveRDS(query11, file=("query11.Rdata"))
```

```{r APICall04, eval=FALSE, message=FALSE, tidy=TRUE, results="hide"}
#Define user list
users04<-c("msdh",
            "NCDHHS",
            "NEDHHS",
            "NHPubHealth",
            "NJDeptofHealth",
            "NMDOH",
            "NCPublicHealth",
            "NVHealthRespon1",
            "OHAOregon",
            "OHdeptofhealth")
#Make API calls with user list
query12 <- get_all_tweets(users = users04,
                      	start_tweets = "2011-12-31T00:00:00Z",
                      	end_tweets = "2014-12-31T18:00:00Z",
                      	n = 200000, data_path = "query12")
query13 <- get_all_tweets(users = users04,
                      	start_tweets = "2014-12-31T00:00:00Z",
                      	end_tweets = "2017-12-31T18:00:00Z",
                      	n = 200000, data_path = "query13")
query14 <- get_all_tweets(users = users04,
                      	start_tweets = "2017-12-31T00:00:00Z",
                      	end_tweets = "2022-12-31T18:00:00Z",
                      	n = 200000, data_path = "query14")
```

```{r exportTweetQueries04, eval=FALSE, tidy=TRUE}
#At time of authoring, n=12269
saveRDS(query12, file=("query12.Rdata"))
#At time of authoring, n=20586
saveRDS(query13, file=("query13.Rdata"))
#At time of authoring, n=70692
saveRDS(query14, file=("query14.Rdata"))
```

```{r APICall05, eval=FALSE, message=FALSE, tidy=TRUE, results="hide"}
#Define user list
users05<-c("PAHealthDept",
            "RIHEALTH",
            "SCDHEC",
            "SDDOH",
            "StateHealthIN",
            "TexasDSHS",
            "TNDeptofHealth",
            "VDHgov",
            "WADeptHealth",
            "WV_DHHR")
#Make API calls with user list
query15 <- get_all_tweets(users = users05,
                      	start_tweets = "2011-12-31T00:00:00Z",
                      	end_tweets = "2014-12-31T18:00:00Z",
                      	n = 200000, data_path = "query15")
query16 <- get_all_tweets(users = users05,
                      	start_tweets = "2014-12-31T00:00:00Z",
                      	end_tweets = "2017-12-31T18:00:00Z",
                      	n = 200000, data_path = "query16")
query17 <- get_all_tweets(users = users05,
                      	start_tweets = "2017-12-31T00:00:00Z",
                      	end_tweets = "2022-12-31T18:00:00Z",
                      	n = 200000, data_path = "query17")
```

```{r exportTweetQueries05, eval=FALSE, tidy=TRUE}
#At time of authoring, n=8847
saveRDS(query15, file=("query15.Rdata"))
#At time of authoring, n=34776
saveRDS(query16, file=("query16.Rdata"))
#At time of authoring, n=106838
saveRDS(query17, file=("query17.Rdata"))
```

```{r APICall06, eval=FALSE, message=FALSE, tidy=TRUE, results="hide"}
#Define user list
users06<-c("US_FDA",
            "UtahCoronavirus",
            "UtahDHHS")
#Make API calls with user list
query18 <- get_all_tweets(users = users06,
                      	start_tweets = "2011-12-31T00:00:00Z",
                      	end_tweets = "2014-12-31T18:00:00Z",
                      	n = 200000, data_path = "query18")
query19 <- get_all_tweets(users = users06,
                      	start_tweets = "2014-12-31T00:00:00Z",
                      	end_tweets = "2017-12-31T18:00:00Z",
                      	n = 200000, data_path = "query19")
query20 <- get_all_tweets(users = users06,
                      	start_tweets = "2017-12-31T00:00:00Z",
                      	end_tweets = "2022-12-31T18:00:00Z",
                      	n = 200000, data_path = "query20")
```

```{r exportTweetQueries06, eval=FALSE, tidy=TRUE}
#At time of authoring, n=2818
saveRDS(query18, file=("query18.Rdata"))
#At time of authoring, n=6018
saveRDS(query19, file=("query19.Rdata"))
#At time of authoring, n=16873
saveRDS(query20, file=("query20.Rdata"))
```
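As noted above, a function would remove most of the repetition in these download chunks. The following is one hypothetical way to do it: `download_windows()`, `windows`, and the `fetch` argument are illustrative names, not part of academictwitteR. The function runs one query per date window, reusing the windows exactly as written above, and passes each to `fetch`, which defaults to academictwitteR's get_all_tweets(). Making `fetch` an argument also lets the loop be tried offline with a stub.

```r
# The three date windows used in every download chunk above
windows <- data.frame(
  start = c("2011-12-31T00:00:00Z", "2014-12-31T00:00:00Z", "2017-12-31T00:00:00Z"),
  end   = c("2014-12-31T18:00:00Z", "2017-12-31T18:00:00Z", "2022-12-31T18:00:00Z")
)

# Hypothetical helper: one query per date window for a group of users.
# `first_index` is the number of the first query (0 for users00, 3 for users01, ...).
# The default `fetch` is only evaluated if no other function is supplied.
download_windows <- function(users, first_index, windows,
                             fetch = academictwitteR::get_all_tweets,
                             n = 200000) {
  idx <- as.integer(first_index + seq_len(nrow(windows)) - 1)
  labels <- sprintf("query%02d", idx)
  results <- lapply(seq_len(nrow(windows)), function(i) {
    fetch(users = users,
          start_tweets = windows$start[i],
          end_tweets = windows$end[i],
          n = n,
          data_path = labels[i])
  })
  names(results) <- labels
  results
}

# Real use (requires API access), e.g.:
# all_queries <- c(download_windows(users00, 0, windows),
#                  download_windows(users01, 3, windows))

# Offline stub with the same arguments, for trying the loop without the API
fake_fetch <- function(users, start_tweets, end_tweets, n, data_path) {
  data.frame(author = users, start = start_tweets, path = data_path)
}
demo <- download_windows(c("CDCgov", "US_FDA"), first_index = 0,
                         windows = windows, fetch = fake_fetch)
names(demo)
```

Returning a named list keeps the query labels attached to the results, which makes the later row-count, column, and merge steps easier to apply with lapply() instead of copy-pasted lines.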

### Merge Data
I wanted to merge the results from the above queries into a single data frame. Most of the smaller data frames have the same 14 columns. Query14 is the only data frame with an extra column: "withheld". Only one Tweet had data in this column: a Retweet of a basketball player promoting an Ugly Holiday Mask partnership with a state governor. I decided to drop the "withheld" column since this was the only case of a withheld Tweet, and it was not withheld in the US.
```{r dropColumnFromQuery14, tidy=TRUE, eval=FALSE}
#Drop the "withheld" column, which only query14 contains
query14 <- within(query14,rm(withheld))
```

Query12 and query20 were the only data frames without a "geo" column. Geo data was not very common in the data frames, but at this stage it seemed safer to keep it anyway, so I added a "geo" column filled with NA values to both data frames.
```{r addColumnToQuery12and20, tidy=TRUE, eval=FALSE}
#Set to NA to match the instances in other DFs where geo data is missing
query12$geo <- query20$geo <- NA
```
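Column mismatches like these can also be found programmatically rather than by inspection. The following is a minimal sketch with toy data frames (the columns are illustrative) that lists, for every column not shared by all frames, which frames lack it. As a related point, dplyr::bind_rows() matches columns by name and fills columns missing from an input with NA.

```r
# Toy stand-ins for query data frames with slightly different columns
frames <- list(
  query12 = data.frame(id = "1", text = "a"),
  query13 = data.frame(id = "2", text = "b", geo = NA),
  query14 = data.frame(id = "3", text = "c", geo = NA, withheld = "x")
)

# Every column name that appears anywhere, and those shared by all frames
all_cols <- unique(unlist(lapply(frames, names)))
shared_cols <- Reduce(intersect, lapply(frames, names))

# For each non-shared column, the frames that lack it
unshared <- setdiff(all_cols, shared_cols)
missing_from <- lapply(setNames(unshared, unshared), function(col) {
  has_col <- vapply(frames, function(df) col %in% names(df), logical(1))
  names(frames)[!has_col]
})
print(missing_from)
```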

I wanted to assign all of the observations their own unique row names before merging them into a single DF. Again, a function probably would have helped here.
```{r assignRowNames, message=FALSE, tidy=TRUE, eval=FALSE}
#NULL out the row names of all 21 queries, as a treat
rownames(query00)<-rownames(query01)<-rownames(query02)<-rownames(query03)<-
  rownames(query04)<-rownames(query05)<-rownames(query06)<-rownames(query07)<-
  rownames(query08)<-rownames(query09)<-rownames(query10)<-rownames(query11)<-
  rownames(query12)<-rownames(query13)<-rownames(query14)<-rownames(query15)<-
  rownames(query16)<-rownames(query17)<-rownames(query18)<-rownames(query19)<-
  rownames(query20)<-NULL

#Set variables equal to length of each df for ease of reference
len_query00<-as.integer(nrow(query00))
len_query01<-as.integer(nrow(query01))
len_query02<-as.integer(nrow(query02))
len_query03<-as.integer(nrow(query03))
len_query04<-as.integer(nrow(query04))
len_query05<-as.integer(nrow(query05))
len_query06<-as.integer(nrow(query06))
len_query07<-as.integer(nrow(query07))
len_query08<-as.integer(nrow(query08))
len_query09<-as.integer(nrow(query09))
len_query10<-as.integer(nrow(query10))
len_query11<-as.integer(nrow(query11))
len_query12<-as.integer(nrow(query12))
len_query13<-as.integer(nrow(query13))
len_query14<-as.integer(nrow(query14))
len_query15<-as.integer(nrow(query15))
len_query16<-as.integer(nrow(query16))
len_query17<-as.integer(nrow(query17))
len_query18<-as.integer(nrow(query18))
len_query19<-as.integer(nrow(query19))
len_query20<-as.integer(nrow(query20))

#Assign rownames sequentially, using offsets to end up with unique row names
rownames(query00)<-seq(1,len_query00)
#Start of number sequence
rownames(query01)<-seq(len_query00+1,
                       #end of sequence
                       len_query00+len_query01)
rownames(query02)<-seq(len_query00+len_query01+1,
                       #end of sequence
                       len_query00+len_query01+len_query02)
rownames(query03)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+1,
      #end of sequence
      len_query00+len_query01+len_query02+len_query03)
rownames(query04)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+1,
      #end of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04)
rownames(query05)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+1,
      #end of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05)
rownames(query06)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+1,
      #end of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05
      +len_query06)
rownames(query07)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+1,
      #end of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07)
rownames(query08)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+1,
      #end of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08)
rownames(query09)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+1,
      #end of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09)
rownames(query10)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+1,
      #end of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10)
rownames(query11)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+1,
      #end of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11)
rownames(query12)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      1,
      #end of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12)
rownames(query13)<-
  #Start of Sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+1,
      #End of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13)
rownames(query14)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13+1,
      #End of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13+len_query14)
rownames(query15)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13+len_query14+1,
      #End of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13+len_query14+len_query15)
rownames(query16)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13+len_query14+len_query15+1,
      #End of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13+len_query14+len_query15+len_query16)
rownames(query17)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13+len_query14+len_query15+len_query16+1,
      #End of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13+len_query14+len_query15+len_query16+len_query17)
rownames(query18)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13+len_query14+len_query15+len_query16+len_query17+1,
      #End of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13+len_query14+len_query15+len_query16+len_query17+
      len_query18)
rownames(query19)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13+len_query14+len_query15+len_query16+len_query17+
      len_query18+1,
      #End of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13+len_query14+len_query15+len_query16+len_query17+
      len_query18+len_query19)
rownames(query20)<-
  #Start of sequence
  seq(len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13+len_query14+len_query15+len_query16+len_query17+
      len_query18+len_query19+1,
      #End of sequence
      len_query00+len_query01+len_query02+len_query03+len_query04+len_query05+
      len_query06+len_query07+len_query08+len_query09+len_query10+len_query11+
      len_query12+len_query13+len_query14+len_query15+len_query16+len_query17+
      len_query18+len_query19+len_query20)
```
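The offset arithmetic above amounts to a cumulative sum of row counts. The following is a shorter sketch of the same idea, assuming the data frames are held in a list in merge order (toy frames here): each frame's row names start one past the total row count of the frames before it.

```r
# Toy stand-ins, listed in the same order they will be merged
frames <- list(
  query00 = data.frame(id = c("a", "b", "c")),
  query01 = data.frame(id = c("d", "e")),
  query02 = data.frame(id = c("f", "g", "h", "i"))
)

# Each frame's offset is the total row count of all frames before it
counts <- vapply(frames, nrow, integer(1))
offsets <- unname(c(0L, head(cumsum(counts), -1L)))

# Number rows consecutively across frames; seq_len() (unlike seq(start, end))
# yields an empty sequence rather than a backwards one for an empty frame
for (i in seq_along(frames)) {
  rownames(frames[[i]]) <- offsets[i] + seq_len(counts[i])
}

lapply(frames, rownames)
```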

With the data frames containing matching columns and unique row names, I used dplyr::bind_rows() with its default settings to combine the data frames' observations into one data frame. The resulting data frame had 690,281 observations.
```{r mergeImports, message=FALSE, tidy=TRUE, eval=FALSE}
df_allTweets <-
  dplyr::bind_rows(query00,query01,query02,query03,query04,
                   query05,query06,query07,query08,query09,
                   query10,query11,query12,query13,query14,
                   query15,query16,query17,query18,query19,query20)
```
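As written, consecutive date windows in the download chunks overlap: each earlier window ends at 18:00 UTC on December 31 of 2014 or 2017, while the next one starts at 00:00 UTC on that same day. A Tweet posted during those 18-hour spans could therefore appear in two queries. The following is a minimal sketch for counting and, if needed, removing such duplicates by Tweet ID; the toy `merged` data frame stands in for df_allTweets.

```r
# Toy stand-in for df_allTweets; ID "102" appears in two adjacent windows
merged <- data.frame(
  id = c("101", "102", "102", "103"),
  created_at = c("2014-12-30T10:00:00.000Z", "2014-12-31T09:00:00.000Z",
                 "2014-12-31T09:00:00.000Z", "2015-01-02T12:00:00.000Z")
)

# Number of rows that repeat an ID already seen
n_dupes <- sum(duplicated(merged$id))
n_dupes

# Keep only the first occurrence of each Tweet ID
deduped <- merged[!duplicated(merged$id), , drop = FALSE]
nrow(deduped)
```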

As noted above, some of the states are represented in this data set by multiple Twitter accounts. I wanted to add a custom "state" variable to facilitate later analysis. I also wanted to use this variable to indicate a CDC- or FDA-associated account. I started by extracting the author IDs to a character vector for use with the academictwitteR package.
```{r extractAuthorId, tidy=TRUE, message=FALSE, results='hide', eval=FALSE}
#Extract each author ID only once, as a plain character vector
#(unique() on the column avoids depending on data frame vs. tibble drop rules)
authorIds<-unique(as.character(df_allTweets$author_id))
#Retrieve profile info using the character vector
df_authorProfiles <- get_user_profile(authorIds, bearer_token = get_bearer())
#Make a minimal version of the dataset
df_authorProfiles<-df_authorProfiles[c("id","name","username")]
```

From here on, I do not share the code I used to add a custom "state" column to df_allTweets, as I don't want to go against Twitter policy by sharing author_id information. To summarize the excluded code: I used a case_when() statement to set the new "state" column to a two-letter postal state abbreviation if the author was a public health agency serving a specific state, or to "CDC" or "FDA" if the author was a CDC or FDA account. I hard-coded the author_id values into the case_when() statement, so sharing the code would violate Twitter policy by revealing author_id information, which is not publicly accessible outside of the API. To illustrate, I have included an example code snippet below that will not evaluate when knit.
```{r example_DO_NOT_RUN, eval=FALSE, tidy=TRUE}
df_allTweets <- df_allTweets %>% mutate(state = case_when(
  #FDA Global
  author_id=="xxxx"~"FDA",
  #State of Utah COVID-19 Response
  author_id=="xxxx"~"UT",
  #CDC Emergency
  author_id=="xxxx"~"CDC",
  #NevadaDHHS
  author_id=="xxxx"~"NV"
))
```
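A possible alternative that avoids hard-coding author IDs is to key the lookup on public usernames (the handles listed in the download chunks) and map IDs to usernames through the profile table retrieved earlier. The following is a hypothetical sketch, not the code used for this dataset: the IDs are placeholders, only a few handles are shown, and `state_by_username`, `profiles`, and `tweets` are illustrative names.

```r
# Mapping from public Twitter handle to the custom "state" value (partial, illustrative).
# Matching is case-sensitive, so handles must match the capitalization the API returns.
state_by_username <- c(
  FDA_Global = "FDA",
  UtahCoronavirus = "UT",
  CDCemergency = "CDC",
  DhhsNevada = "NV"
)

# Placeholder stand-ins for df_authorProfiles and df_allTweets
profiles <- data.frame(id = c("111", "222", "333"),
                       username = c("CDCemergency", "DhhsNevada", "SomeOtherAccount"))
tweets <- data.frame(id = c("t1", "t2", "t3", "t4"),
                     author_id = c("222", "111", "222", "333"))

# author_id -> username -> state; handles missing from the lookup become NA
usernames <- profiles$username[match(tweets$author_id, profiles$id)]
tweets$state <- unname(state_by_username[usernames])
tweets
```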

### Data Frame Export
At this stage, I considered my basic organization done. df_allTweets represents my raw data, which I further clean and refine in other RMD files for specific analyses in my dissertation. My last step here was to export the newly created df for myself, as well as a TXT file of Tweet IDs for others to hydrate.
```{r exportDF00, tidy=TRUE, eval=FALSE}
#Note use of saveRDS(). Restore object with readRDS()
saveRDS(df_allTweets, file = "df_allTweets.Rdata")
tweetIDs <- (df_allTweets$id)
```
```{r exportDF01, tidy=TRUE, eval=FALSE}
#Export the IDs of all Tweets, one per line
write.table(
  tweetIDs,
  "publicHealthTweetIDs.txt",
  #Create a txt file ready for hydration, no extraneous characters
  quote=FALSE,row.names=FALSE,col.names=FALSE
)
```

Here, I also wanted to export my custom "state" variable alongside the Tweet IDs for others to work with in R.
```{r exportDF02, tidy=TRUE, eval=FALSE}
df_allTweets_sharable <- select(df_allTweets, c(id,state))
saveRDS(df_allTweets_sharable, file = "df_allTweets_sharable.Rdata")
write.csv(df_allTweets_sharable,file="df_allTweets_sharable.csv",row.names=FALSE)
```
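Tweet IDs are integers of up to 19 digits, more than a double-precision number can represent exactly, so they are safest kept as character strings when the shared files are read back in. The following is a minimal sketch of reading a CSV and an ID list like the ones written above with every column forced to character. Toy data and temporary files stand in for df_allTweets_sharable.csv and publicHealthTweetIDs.txt.

```r
# Toy stand-in with realistic 19-digit Tweet IDs
sharable <- data.frame(id = c("1600000000000000001", "1600000000000000002"),
                       state = c("MA", "CDC"))

# Write the CSV the same way as above, then read it back as character
csv_path <- file.path(tempdir(), "sharable_demo.csv")
write.csv(sharable, file = csv_path, row.names = FALSE)
restored <- read.csv(csv_path, colClasses = "character")

# Write the ID list the same way as above; readLines() keeps IDs as text
ids_path <- file.path(tempdir(), "ids_demo.txt")
write.table(sharable$id, ids_path,
            quote = FALSE, row.names = FALSE, col.names = FALSE)
ids <- readLines(ids_path)
```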
